Using Parsed Corpora for Structural Disambiguation in the TRAINS Domain

Abstract

This paper describes a prototype disambiguation module, KANKEI, which uses two corpora of the TRAINS project. In ambiguous verb phrases of form V ... NP PP or V ... NP adverb(s), the two corpora have very different PP and adverb attachment patterns: in the first, the correct attachment is to the VP 88.7% of the time, while in the second, the correct attachment is to the NP 73.5% of the time. KANKEI uses various n-gram patterns of the phrase heads around these ambiguities and assigns parse trees with these ambiguities a score based on a linear combination of the frequencies with which these patterns appear with NP and VP attachments in the TRAINS corpora. Unlike previous statistical disambiguation systems, this technique thus combines evidence from bigrams, trigrams, and the […]-gram around an ambiguous attachment. In the current experiments, equal weights are used for simplicity, but results are still good on the TRAINS corpora ([…] and […] accuracy). Despite the large statistical differences in attachment preferences in the two corpora, training on the first corpus and testing on the second gives an accuracy of […]. These results suggest that our technique captures attachment patterns that are useful across corpora.

This work was supported in part by National Science Foundation grant IRI-[…]. James Allen's helpful suggestions are gratefully acknowledged.
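
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not KANKEI's actual implementation. The head tuple (verb, noun, prep, obj), the pattern templates, the class name `AttachmentScorer`, the toy training examples, and the tie-breaking default to VP are all assumptions made for illustration. The only parts taken from the abstract are the use of attachment frequencies of head patterns of several sizes and their combination with equal weights.

```python
from collections import defaultdict

# Hypothetical pattern templates over the phrase heads around the ambiguity.
# Each template names the head slots that form one n-gram pattern.
TEMPLATES = [
    ("verb", "prep"),
    ("noun", "prep"),
    ("prep", "obj"),
    ("verb", "noun", "prep"),
    ("verb", "prep", "obj"),
    ("noun", "prep", "obj"),
    ("verb", "noun", "prep", "obj"),
]


def extract(heads):
    """Return one (template, values) key per template for a dict of heads."""
    return [(t, tuple(heads[slot] for slot in t)) for t in TEMPLATES]


class AttachmentScorer:
    """Toy scorer: equal-weight sum of relative attachment frequencies."""

    def __init__(self):
        # Maps a pattern key to counts of NP and VP attachments seen in training.
        self.counts = defaultdict(lambda: {"NP": 0, "VP": 0})

    def train(self, examples):
        for heads, label in examples:
            for key in extract(heads):
                self.counts[key][label] += 1

    def score(self, heads, label):
        total = 0.0
        for key in extract(heads):
            c = self.counts.get(key)  # .get avoids creating empty entries
            if c is None:
                continue  # pattern never seen in training: no evidence
            total += c[label] / (c["NP"] + c["VP"])  # weight of 1 per pattern
        return total

    def attach(self, heads):
        # Assumption: ties (including no evidence at all) default to VP.
        return "NP" if self.score(heads, "NP") > self.score(heads, "VP") else "VP"


# Invented training data, for illustration only.
scorer = AttachmentScorer()
scorer.train([
    ({"verb": "move", "noun": "engine", "prep": "to", "obj": "Corning"}, "VP"),
    ({"verb": "take", "noun": "boxcar", "prep": "of", "obj": "oranges"}, "NP"),
    ({"verb": "send", "noun": "train", "prep": "to", "obj": "Avon"}, "VP"),
])
query = {"verb": "move", "noun": "boxcar", "prep": "of", "obj": "bananas"}
print(scorer.attach(query))
```

Because smaller patterns such as (noun, prep) still match when the full head tuple was never seen in training, a scorer of this shape can fall back on partial evidence, which is one plausible reading of why combining several n-gram sizes helps.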


Similar resources

Using Parsed Corpora for Structural Disambiguation in the TRAINS Domain

This paper describes a prototype disambiguation module, KANKEI, which was tested on two corpora of the TRAINS project. In ambiguous verb phrases of form V ... NP PP or V ... NP adverb(s), the two corpora have very different PP and adverb attachment patterns; in the first, the correct attachment is to the VP 88.7% of the time, while in the second, the correct attachment is to the NP 73.5% of...


Acquisition et évaluation sur corpus de propriétés de sous-catégorisation syntaxique

We carry out an experiment aimed at integrating subcategorization information into a syntactic parser for PP attachment disambiguation. The subcategorization lexicon consists of probabilities between a word (verb, noun, adjective) and a preposition. The lexicon is acquired automatically from a 200-million-word corpus that is partially tagged and parsed. In order to assess the lexicon, we use four di...


Learning domain theories

By a ‘domain theory’ we mean a collection of facts and generalisations or rules which capture what commonly happens (or does not happen) in some domain of interest. As language users, we implicitly draw on such theories in various disambiguation tasks, such as anaphora resolution and prepositional phrase attachment, and formal encodings of domain theories can be used for this purpose in natural...


The TreeBanker: a Tool for Supervised Training of Parsed Corpora

I describe the TreeBanker, a graphical tool for the supervised training involved in domain customization of the disambiguation component of a speech- or language-understanding system. The TreeBanker presents a user, who need not be a system expert, with a range of properties that distinguish competing analyses for an utterance and that are relatively easy to judge. This allows training on a corpus...


Semantic Feature Engineering for Enhancing Disambiguation Performance in Deep Linguistic Processing

The task of parse disambiguation has gained in importance over the last decade as the complexity of grammars used in deep linguistic processing has been increasing. In this paper we propose to employ the fine-grained HPSG formalism in order to investigate the contribution of deeper linguistic knowledge to the task of ranking the different trees the parser outputs. In particular, we focus on the...




Publication date: 2009